--filter-name "ReadPostRankSum-20" \
-O filteredVCF/hardfilteredIndels.vcf
4.3 VISUALIZING VARIANTS
The variants in a VCF file can be visualized in a variant viewer such as IGV (Integrative
Genomics Viewer), an open-source program available for all major platforms. It can be
downloaded from “https://software.broadinstitute.org/software/igv/download” and installed on
a local computer. Figure 4.7 shows the allele fractions and genotypes for each of the InDels
and SNPs of the samples. Dark blue indicates a heterozygous genotype, cyan indicates a
homozygous variant genotype, and gray indicates a homozygous reference genotype. Refer
to the IGV documentation for more details.
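
To make the color coding concrete, the genotype classes that IGV distinguishes can also be read directly from the GT field of a VCF record. The following is only a minimal sketch: the function name genotype_colors and the tiny demo VCF are made up for illustration, and the code assumes diploid calls with GT as the first key of the FORMAT column.

```shell
#!/bin/sh
# Hypothetical helper (not part of IGV, bcftools, or GATK): report, for one
# sample column of a VCF, the genotype class and the color IGV uses for it.
# Assumes diploid genotypes and that GT is the first FORMAT key.
genotype_colors() {
  vcf_file="$1"
  sample_col="${2:-10}"               # column 10 holds the first sample
  awk -F'\t' -v col="$sample_col" '
    /^#/ { next }                     # skip meta-information and header lines
    {
      split($col, field, ":")         # GT is the first colon-separated value
      n = split(field[1], allele, "[/|]")   # unphased (/) or phased (|)
      if (n != 2 || allele[1] == "." || allele[2] == ".")
        label = "no_call"
      else if (allele[1] != allele[2])
        label = "heterozygous (dark blue)"
      else if (allele[1] == "0")
        label = "homozygous reference (gray)"
      else
        label = "homozygous variant (cyan)"
      print $1 "\t" $2 "\t" field[1] "\t" label
    }' "$vcf_file"
}

# Usage with a small made-up VCF (illustrative values only)
demo_vcf="$(mktemp)"
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n' >> "$demo_vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\n' >> "$demo_vcf"
printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28\n' >> "$demo_vcf"
printf 'chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/0:31\n' >> "$demo_vcf"
genotype_colors "$demo_vcf"
rm -f "$demo_vcf"
```

For the demo file, the function prints one tab-separated line per site: chromosome, position, the raw GT value, and the class with its IGV color. A second argument selects another sample column, for example genotype_colors file.vcf 11 for the second sample.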
4.4 VARIANT ANNOTATION AND PRIORITIZATION
Variant calling with any of the variant callers, such as bcftools, FreeBayes, or GATK,
and the subsequent variant filtering are followed by variant annotation and prioritization.
Variant annotation adds information and knowledge to high-confidence variants to help
assess which variants are likely to affect function. The variant calling workflow yields
high-quality variants in a single VCF file that includes the genotypes of all samples.
Since variant discovery usually involves the whole genome or whole exome of one or more
individuals of a species, thousands of variants may be discovered. However, we are usually
interested in the variants that affect function or are associated with diseases or other
important phenotypes. Variants may
affect function in different ways depending on their type. Variants can occur
anywhere in the genome sequence, but the most deleterious and damaging are those
that affect the function of a gene. A variant may suppress, inhibit, or activate a
gene if it affects the gene's regulatory region. Such effects are often seen in cancer
cells, in which mutations may lead to hyperactivity of proto-oncogenes, which accelerates
cell growth and division, or to inactivation of tumor suppressor genes. Variants that affect
the coding regions of a gene may have an impact that depends on the consequence of the
change. A single-nucleotide variant (SNV) that creates a stop codon will produce a truncated
protein that does not function normally. On the other hand, an SNV in a stop codon may lead
to a stop loss that results in a longer protein. A variant in a splicing region may also alter
the sequence and function of the protein. More often, SNVs in the coding regions of a
gene substitute one amino acid for another, which can change the characteristics and
function of the translated protein. These SNVs are known as missense SNVs, and they are
the most easily predicted variants. An SNV can also be synonymous, meaning that it changes
the codon but not the amino acid. Although this kind of variant does not change the protein
sequence, it may still have a biological consequence. Insertion or deletion of one or
more nucleotides in the coding region may lead to a frameshift (when the number of
nucleotides is not a multiple of three), and hence, the protein will be translated
incorrectly from that point onward.
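
The coding consequences described above can be illustrated with a small sketch that compares a reference codon with an altered codon using the standard genetic code. The helper names codon_effect and indel_effect are hypothetical and exist only for this example. The sketch assumes the change lies within a single, in-frame codon of a coding region, and it is not a replacement for a real annotation tool.

```shell
#!/bin/sh
# Hypothetical helper: classify a single-codon change with the standard
# genetic code. Both codons are given 5'->3' on the coding strand.
codon_effect() {
  awk -v ref="$1" -v alt="$2" '
    BEGIN {
      # Standard genetic code in TCAG order; "*" marks a stop codon.
      bases = "TCAG"
      aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      k = 0
      for (i = 1; i <= 4; i++)
        for (j = 1; j <= 4; j++)
          for (l = 1; l <= 4; l++) {
            k++
            codon = substr(bases, i, 1) substr(bases, j, 1) substr(bases, l, 1)
            aa[codon] = substr(aas, k, 1)
          }
      ref = toupper(ref); alt = toupper(alt)
      if (!(ref in aa) || !(alt in aa)) { print "invalid_codon"; exit 1 }
      if (aa[ref] == aa[alt])   print "synonymous"   # same amino acid
      else if (aa[alt] == "*")  print "stop_gained"  # truncated protein
      else if (aa[ref] == "*")  print "stop_lost"    # extended protein
      else                      print "missense"     # amino acid substituted
    }'
}

# Hypothetical helper: decide whether a coding insertion or deletion shifts
# the reading frame, based only on the REF and ALT allele lengths.
indel_effect() {
  len_diff=$(( ${#2} - ${#1} ))
  if [ "$len_diff" -eq 0 ]; then
    echo "not_an_indel"
  elif [ $(( len_diff % 3 )) -eq 0 ]; then
    echo "inframe_indel"
  else
    echo "frameshift"
  fi
}

# Usage
codon_effect TGG TGA    # stop_gained
codon_effect TAA CAA    # stop_lost
codon_effect GAG GTG    # missense
codon_effect CTT CTC    # synonymous
indel_effect A AT       # frameshift
indel_effect ATCG A     # inframe_indel
```

The codon table is generated from the 64-letter amino acid string rather than typed out codon by codon, which keeps the sketch short. Real annotators also account for the transcript, strand, and splice sites, which this example ignores.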
The most deleterious variants are stop-gain, frameshift, and splicing region variants
since they lead to loss of function. However, before we decide on a variant effect, we should